Coagent Networks: Generalized and Scaled
Kostas, James E., Jordan, Scott M., Chandak, Yash, Theocharous, Georgios, Gupta, Dhawal, White, Martha, da Silva, Bruno Castro, Thomas, Philip S.
Coagent networks for reinforcement learning (RL) [Thomas and Barto, 2011] provide a powerful and flexible framework for deriving principled learning rules for arbitrary stochastic neural networks. The coagent framework offers an alternative to backpropagation-based deep learning (BDL) that overcomes some of backpropagation's main limitations. For example, coagent networks can compute different parts of the network asynchronously (at different rates or at different times), can incorporate non-differentiable components that cannot be used with backpropagation, and can explore at levels higher than their action spaces (that is, they can be designed as hierarchical networks for exploration and/or temporal abstraction). However, the coagent framework is not just an alternative to BDL; the two approaches can be blended: BDL can be combined with coagent learning rules to create architectures with the advantages of both approaches. This work generalizes the coagent theory and learning rules provided by previous works; this generalization provides more flexibility for network architecture design within the coagent framework. This work also studies one of the chief disadvantages of coagent networks: high-variance updates for networks that have many coagents and do not use backpropagation. We show that a coagent algorithm with a policy network that does not use backpropagation can scale to a challenging RL domain with a high-dimensional state and action space (the MuJoCo Ant environment), learning reasonable (although not state-of-the-art) policies. These contributions motivate and provide a more general theoretical foundation for future work that studies coagent networks.
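The core idea behind coagent learning rules is that each coagent performs its own local policy-gradient (REINFORCE-style) update, treating the rest of the network as part of its environment, so no gradients flow between modules. Below is a minimal illustrative sketch of this idea on a toy contextual-bandit problem with two tabular-softmax coagents; the problem, network sizes, and hyperparameters are invented for illustration and are not from the paper's experiments.

```python
# Sketch (assumed setup, not the paper's): a two-coagent network trained
# without backpropagation. Coagent 1 maps the observation to a stochastic
# hidden action; coagent 2 maps (observation, hidden action) to the external
# action. Each coagent applies its own REINFORCE-style update using only its
# local log-probability gradient -- no gradients cross module boundaries.
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

n_obs, n_hidden, n_act = 2, 2, 2
theta1 = np.zeros((n_obs, n_hidden))           # coagent 1 parameters
theta2 = np.zeros((n_obs * n_hidden, n_act))   # coagent 2 parameters
alpha = 0.5                                    # step size (illustrative)

def reward(obs, act):
    # Toy contextual bandit: reward 1 iff the action matches the observation.
    return 1.0 if act == obs else 0.0

for step in range(5000):
    obs = rng.integers(n_obs)
    # Coagent 1 samples a stochastic hidden action from its local policy.
    p1 = softmax(theta1[obs])
    h = rng.choice(n_hidden, p=p1)
    # Coagent 2 conditions on (obs, hidden action) and samples the action.
    idx = obs * n_hidden + h
    p2 = softmax(theta2[idx])
    a = rng.choice(n_act, p=p2)
    r = reward(obs, a)
    # Local REINFORCE updates; for a softmax, grad log pi = onehot - probs.
    g1 = -p1
    g1[h] += 1.0
    g2 = -p2
    g2[a] += 1.0
    theta1[obs] += alpha * r * g1
    theta2[idx] += alpha * r * g2

# Evaluate the composed (greedy) policy: count observations handled correctly.
correct = 0
for obs in range(n_obs):
    h = int(np.argmax(theta1[obs]))
    a = int(np.argmax(theta2[obs * n_hidden + h]))
    correct += int(a == obs)
print(correct)  # number of observations the learned network gets right
```

Note that each update uses only quantities local to one coagent, which is what allows asynchronous execution and non-differentiable components: the network-level gradient is never assembled by backpropagation.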
Reinforcement Learning Without Backpropagation or a Clock
Kostas, James, Nota, Chris, Thomas, Philip S.
Reinforcement learning (RL) algorithms share qualitative similarities with the algorithms implemented by animal brains. However, there remain clear differences between these two types of algorithms. For example, while RL algorithms using artificial neural networks require information to flow backwards through the network via the backpropagation algorithm, there is currently debate about whether this is feasible in biological neural implementations (Werbos and Davis, 2016). Policy gradient coagent networks (PGCNs) are a class of RL algorithms that were introduced to remove this possibly biologically implausible property of RL algorithms--they use artificial neural networks but do not use the backpropagation algorithm (Thomas, 2011). Since their introduction, PGCN algorithms have proven to be not only a possible improvement in biological plausibility, but a practical tool for improving RL agents. They were used to solve RL problems with high-dimensional action spaces (Thomas and Barto, 2012), are the RL precursor to the more general stochastic computation graphs (Schulman et al., 2015), and, as we will show in this paper, generalize the recently proposed option-critic architecture (Bacon et al., 2017), while drastically simplifying key derivations.